perf(python): Improve performance of indexing operations on Series. #5610
Merged
github-actions (bot) added the performance (Performance issues or improvements) and python (Related to Python Polars) labels on Nov 24, 2022
import numpy as np
import polars as pl
import polars.internals as pli
numpy_array = np.arange(0, 1000000)
series_pl = pl.Series(numpy_array)
numpy_idxs_int64 = np.random.randint(1, 1000000, 10000)
numpy_idxs_uint64 = numpy_idxs_int64.astype(np.uint64)
numpy_idxs_int32 = numpy_idxs_int64.astype(np.int32)
numpy_idxs_uint32 = numpy_idxs_int64.astype(np.uint32)
numpy_idxs_int16 = numpy_idxs_int64.astype(np.int16)
numpy_idxs_uint16 = numpy_idxs_int64.astype(np.uint16)
series_idxs_int64 = pl.Series("idx", numpy_idxs_int64)
series_idxs_uint64 = pl.Series("idx", numpy_idxs_uint64)
series_idxs_int32 = pl.Series("idx", numpy_idxs_int32)
series_idxs_uint32 = pl.Series("idx", numpy_idxs_uint32)
series_idxs_int16 = pl.Series("idx", numpy_idxs_int16)
series_idxs_uint16 = pl.Series("idx", numpy_idxs_uint16)
python_idxs_list = series_idxs_int64.to_list()
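A note on the narrower index dtypes in the setup above: the indexes are drawn from the range 1 to 999,999, which does not fit in 16 bits, so `astype(np.int16)` and `astype(np.uint16)` wrap around instead of raising. The short check below uses made-up sample values (not the benchmark's random draws) to show that the int16 index arrays therefore contain negative indexes, which is worth keeping in mind when comparing the int16 rows with the others.

```python
import numpy as np

# Illustrative values only; the benchmark uses np.random.randint(1, 1000000, 10000).
sample = np.array([100, 40000, 70000, 999999], dtype=np.int64)

as_int16 = sample.astype(np.int16)    # wraps modulo 2**16 into [-32768, 32767]
as_uint16 = sample.astype(np.uint16)  # wraps modulo 2**16 into [0, 65535]

print(as_int16.tolist())   # [100, -25536, 4464, 16959]
print(as_uint16.tolist())  # [100, 40000, 4464, 16959]
```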
# Numpy baseline:
In [3]: %timeit numpy_array[numpy_idxs_int64]
13.8 µs ± 201 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [4]: %timeit numpy_array[numpy_idxs_int32]
23 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [5]: %timeit numpy_array[numpy_idxs_int16]
22.6 µs ± 82.2 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [6]: %timeit numpy_array[numpy_idxs_uint64]
23.1 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [7]: %timeit numpy_array[numpy_idxs_uint32]
22.9 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [8]: %timeit numpy_array[numpy_idxs_uint16]
21.1 µs ± 48.3 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# New
In [2]: %timeit series_pl[numpy_idxs_int64]
49.7 µs ± 863 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [3]: %timeit series_pl[numpy_idxs_int32]
48.3 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [4]: %timeit series_pl[numpy_idxs_int16]
81.8 µs ± 1.45 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [5]: %timeit series_pl[numpy_idxs_uint64]
54.8 µs ± 855 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [6]: %timeit series_pl[numpy_idxs_uint32]
34.4 µs ± 450 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [7]: %timeit series_pl[numpy_idxs_uint16]
27.9 µs ± 402 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# Old
In [4]: %timeit series_pl[numpy_idxs_int64]
114 µs ± 4.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: %timeit series_pl[numpy_idxs_int32]
108 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [6]: %timeit series_pl[numpy_idxs_int16]
142 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: %timeit series_pl[numpy_idxs_uint64]
93.6 µs ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [8]: %timeit series_pl[numpy_idxs_uint32]
71.5 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [9]: %timeit series_pl[numpy_idxs_uint16]
61.9 µs ± 3.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# New
In [17]: %timeit series_pl[series_idxs_int64]
61.3 µs ± 176 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [18]: %timeit series_pl[series_idxs_int32]
45 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [19]: %timeit series_pl[series_idxs_int16]
210 µs ± 4.01 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [20]: %timeit series_pl[series_idxs_uint64]
51.2 µs ± 562 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [21]: %timeit series_pl[series_idxs_uint32]
20.1 µs ± 87.1 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [22]: %timeit series_pl[series_idxs_uint16]
19.4 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# Old
In [10]: %timeit series_pl[series_idxs_int64]
97.7 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [11]: %timeit series_pl[series_idxs_int32]
84 µs ± 647 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [12]: %timeit series_pl[series_idxs_int16]
277 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [13]: %timeit series_pl[series_idxs_uint64]
90.6 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [14]: %timeit series_pl[series_idxs_uint32]
24.5 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [15]: %timeit series_pl[series_idxs_uint16]
56.1 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# New
In [34]: %timeit numpy_array[python_idxs_list]
399 µs ± 2.48 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [35]: %timeit series_pl[python_idxs_list]
513 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# Old
In [21]: %timeit numpy_array[python_idxs_list]
336 µs ± 694 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [22]: %timeit series_pl[python_idxs_list]
588 µs ± 6.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# New
In [26]: %timeit numpy_array[100]
53.5 ns ± 0.474 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [27]: %timeit series_pl[100]
4.82 µs ± 27 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# Old
In [23]: %timeit numpy_array[100]
53.5 ns ± 0.299 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [24]: %timeit series_pl[100]
6 µs ± 30.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# New
In [28]: %timeit numpy_array[[100, 200]]
528 ns ± 1.62 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [29]: %timeit series_pl[[100, 200]]
42.4 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# Old
In [27]: %timeit numpy_array[[100, 200]]
572 ns ± 1.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [28]: %timeit series_pl[[100, 200]]
108 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
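To make the old-versus-new comparison easier to read, the snippet below computes speedup ratios from the mean timings reported above for indexing a Series with numpy arrays. The numbers are copied from the %timeit output; the dictionary layout and variable names are only for illustration.

```python
# Mean timings in microseconds, copied from the %timeit output above
# for series_pl[numpy_idxs_<dtype>].
old_us = {"int64": 114, "int32": 108, "int16": 142,
          "uint64": 93.6, "uint32": 71.5, "uint16": 61.9}
new_us = {"int64": 49.7, "int32": 48.3, "int16": 81.8,
          "uint64": 54.8, "uint32": 34.4, "uint16": 27.9}

# Speedup is the old mean divided by the new mean for the same index dtype.
speedup = {dtype: old_us[dtype] / new_us[dtype] for dtype in old_us}

for dtype, ratio in speedup.items():
    print(f"{dtype:>6}: {old_us[dtype]:6.1f} us -> {new_us[dtype]:5.1f} us ({ratio:.2f}x)")
```

On these figures, the numpy-index paths come out roughly 1.7x to 2.3x faster than before.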
ghuls force-pushed the perf_python_getitem branch from 940b8a1 to 2a27beb (November 24, 2022 07:58)
Improve performance of indexing operations on Series:

- Check for Series and numpy arrays first, and handle most of the logic in _pos_idxs(). For signed numpy arrays, convert negative indexes to absolute indexes and then convert to an unsigned numpy array, which is faster than creating a new Series with an unsigned dtype from the signed numpy array.
- Remove the dispatch of cast to the expression API, as it adds around 20 microseconds to each cast call, and cast is used relatively often in _pos_idxs().
- Move the expensive checks on Sequences to the last moment, after checking all other instance types.
- Move the deprecated boolean mask methods to the end, as they can be the slowest paths.

The speed of Series.to_frame() with a new name argument is also improved by renaming the Series before converting it to a frame (-20 microseconds).
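The handling of signed numpy index arrays described in the commit message can be sketched with numpy alone. This is a minimal illustration under stated assumptions, not the code from this PR: the helper name `to_unsigned_idxs` is hypothetical, it targets `np.uint32` (assuming the Series has fewer than 2**32 rows), and the bounds check is included for safety whether or not `_pos_idxs()` performs one.

```python
import numpy as np


def to_unsigned_idxs(idxs: np.ndarray, length: int) -> np.ndarray:
    """Hypothetical helper: turn signed indexes into absolute uint32 positions."""
    if idxs.dtype.kind == "u":
        # Already unsigned: nothing can be negative, just normalise the width.
        return idxs.astype(np.uint32, copy=False)
    if idxs.dtype.kind != "i":
        raise TypeError(f"expected an integer index array, got {idxs.dtype}")

    # Widen first so that `idx + length` cannot overflow a narrow dtype such as int16.
    wide = idxs.astype(np.int64, copy=False)
    # Negative indexes count from the end: -1 -> length - 1, and so on.
    absolute = np.where(wide < 0, wide + length, wide)

    if absolute.size and (absolute.min() < 0 or absolute.max() >= length):
        raise IndexError("index out of bounds")

    # Cast to unsigned while still in numpy, before any Series is built.
    return absolute.astype(np.uint32)


idxs = np.array([0, -1, 5, -10], dtype=np.int16)
print(to_unsigned_idxs(idxs, length=10).tolist())  # [0, 9, 5, 0]
```

The resulting array could then be passed on as an unsigned index array, so that no signed-to-unsigned cast is needed after a Series has been created.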
ghuls force-pushed the perf_python_getitem branch from 2a27beb to 5060b16 (November 24, 2022 09:19)
zundertj pushed a commit to zundertj/polars that referenced this pull request on Jan 7, 2023